

Implicit Attention


Reviews: Multimodal Residual Learning for Visual QA

Neural Information Processing Systems

The authors successfully build upon two effective ideas, deep residual learning and element-wise multiplication as implicit attention, to create a solution for general multimodal tasks. Experiments were carefully run to select an optimal architecture and hyperparameters for the targeted Visual QA task. The results appear superb compared to previous studies using various deep learning techniques. It would be helpful if the authors could present additional comparisons with existing techniques in terms of model parameter count, as well as the amount of data required for learning. It would also be interesting to assess the value of residual learning and implicit attention separately on the Visual QA task, to help understand which aspect is most critical.
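The two ideas the review highlights can be combined in a single joint block: the question embedding flows along a residual path, and an element-wise product of projected question and visual features acts as implicit attention, gating which visual dimensions contribute. A minimal numpy sketch under stated assumptions (the function name, single-block structure, and tanh nonlinearities are illustrative simplifications, not the paper's exact architecture):

```python
import numpy as np

def multimodal_residual_block(q, v, Wq, Wv):
    """Hypothetical joint residual block: the question embedding q is the
    residual path; the element-wise product of nonlinear projections of q
    and visual features v serves as an implicit attention over v."""
    joint = np.tanh(q @ Wq) * np.tanh(v @ Wv)  # element-wise gating
    return q + joint                            # residual connection

rng = np.random.default_rng(0)
d = 8
q = rng.normal(size=(1, d))          # question embedding
v = rng.normal(size=(1, d))          # visual features
Wq = rng.normal(size=(d, d))
Wv = rng.normal(size=(d, d))
h = multimodal_residual_block(q, v, Wq, Wv)
```

Note that when the visual features are zero, the joint term vanishes and the block reduces to the identity on the question path, which is the residual-learning property the review refers to.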


A Unified Implicit Attention Formulation for Gated-Linear Recurrent Sequence Models

Zimerman, Itamar, Ali, Ameen, Wolf, Lior

arXiv.org Artificial Intelligence

Recent advances in efficient sequence modeling have led to attention-free layers, such as Mamba, RWKV, and various gated RNNs, all featuring sub-quadratic complexity in sequence length and excellent scaling properties, enabling the construction of a new type of foundation model. In this paper, we present a unified view of these models, formulating such layers as implicit causal self-attention layers. The formulation covers most of their sub-components and is not limited to a specific part of the architecture. The framework compares the underlying mechanisms of different layers on similar grounds and provides a direct means for applying explainability methods. Our experiments show that our attention matrices and attribution method outperform an alternative, more limited formulation recently proposed for Mamba. For the other architectures, for which our method is the first to provide such a view, it is effective and competitive on the relevant metrics with the results obtained by state-of-the-art transformer explainability methods. Our code is publicly available.
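The core of this unification can be illustrated on a single scalar channel: unrolling a gated linear recurrence yields a causal lower-triangular matrix whose entries play the role of attention weights. A minimal sketch, assuming a simplified scalar recurrence (the paper's formulation covers full multi-channel layers and their sub-components; names here are illustrative):

```python
import numpy as np

def scan(a, b, x):
    """Gated linear recurrence y_t = h_t, with h_t = a_t * h_{t-1} + b_t * x_t."""
    h, ys = 0.0, []
    for t in range(len(x)):
        h = a[t] * h + b[t] * x[t]
        ys.append(h)
    return np.array(ys)

def implicit_attention_matrix(a, b):
    """Unrolled view: A[t, s] = (prod_{k=s+1..t} a_k) * b_s, so that
    y = A @ x reproduces the recurrence exactly. A is causal (lower-triangular),
    which is why such layers can be read as implicit causal self-attention."""
    T = len(a)
    A = np.zeros((T, T))
    for t in range(T):
        for s in range(t + 1):
            A[t, s] = np.prod(a[s + 1 : t + 1]) * b[s]
    return A

rng = np.random.default_rng(1)
T = 6
a = rng.uniform(0.1, 0.9, size=T)   # data-dependent gates (illustrative values)
b = rng.normal(size=T)              # input projections
x = rng.normal(size=T)              # input sequence
A = implicit_attention_matrix(a, b)
```

Once the layer is expressed as the matrix A, standard attention-based explainability methods can be applied to it directly, which is the practical payoff the abstract describes.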


Unlocking Pixels for Reinforcement Learning via Implicit Attention

Choromanski, Krzysztof, Jain, Deepali, Parker-Holder, Jack, Song, Xingyou, Likhosherstov, Valerii, Santara, Anirban, Pacchiano, Aldo, Tang, Yunhao, Weller, Adrian

arXiv.org Artificial Intelligence

There has recently been significant interest in training reinforcement learning (RL) agents in vision-based environments. This poses many challenges, such as high dimensionality and the potential for observational overfitting through spurious correlations. A promising approach to solving both of these problems is a self-attention bottleneck, which provides a simple and effective framework for learning high-performing policies, even in the presence of distractions. However, due to the poor scalability of attention architectures, these methods do not scale beyond low-resolution visual inputs and must use large patches (thus small attention matrices). In this paper we make use of new efficient attention algorithms, recently shown to be highly effective for Transformers, and demonstrate that these techniques can be applied in the RL setting. This allows our attention-based controllers to scale to larger visual inputs and facilitates the use of smaller patches, even individual pixels, improving generalization. In addition, we propose a new efficient algorithm that approximates softmax attention with what we call hybrid random features, leveraging the theory of angular kernels. We show theoretically and empirically that hybrid random features are a promising approach when using attention for vision-based RL.
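The efficiency trick behind scaling attention to many pixels is to estimate the softmax kernel exp(q . k) with random feature maps, so attention never materializes the full n-by-n matrix of pairwise scores in its linear-time form. The sketch below uses only Performer-style positive random features as a stand-in; the paper's hybrid random features additionally mix in trigonometric estimators via angular kernels, which is omitted here, and all names are illustrative (the 1/sqrt(d) temperature scaling is also dropped for brevity):

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Exact softmax attention (no 1/sqrt(d) scaling, for simplicity)."""
    S = np.exp(Q @ K.T)
    return (S / S.sum(axis=1, keepdims=True)) @ V

def positive_random_features(X, W):
    """Positive random features: E[phi(q) @ phi(k)] = exp(q . k).
    W holds m Gaussian projection vectors."""
    return np.exp(X @ W.T - 0.5 * (X ** 2).sum(axis=1, keepdims=True)) \
        / np.sqrt(W.shape[0])

def approx_attention(Q, K, V, W):
    Qf, Kf = positive_random_features(Q, W), positive_random_features(K, W)
    S = Qf @ Kf.T                      # unbiased estimate of exp(Q @ K.T)
    return (S / S.sum(axis=1, keepdims=True)) @ V

rng = np.random.default_rng(0)
n, d, m = 5, 4, 4096                   # tokens, feature dim, random features
Q, K, V = (0.3 * rng.normal(size=(n, d)) for _ in range(3))
W = rng.normal(size=(m, d))            # shared random projections
exact = softmax_attention(Q, K, V)
approx = approx_attention(Q, K, V, W)
```

In the approximate form, (Qf @ Kf.T) @ V can be regrouped as Qf @ (Kf.T @ V), which costs O(n m d) instead of O(n^2 d) and is what lets attention-based controllers operate on per-pixel tokens.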